A 3-Steps Algorithm for Morphological Disambiguation Using Untagged Corpora

Author

  • Anna Pappa
Abstract

This article presents a three-step algorithm for morphological disambiguation between the definite article and the personal pronoun in French. Tested on a large untagged corpus, it achieves accuracy above 98% with an error rate below 1%. The method has also been tested on unlabeled Greek corpora, and the results show that the system is portable to other languages with a similar structure. No prior knowledge is available to the system. The rule-based procedure is robust and self-correcting. It can also be used as a shallow parser to identify verbal and nominal groups. The last step of the algorithm consists of creating a dictionary whose entries are classified into two grammatical categories: nominal and verbal.
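
To make the task concrete, here is a minimal Python sketch of the general idea: French "le", "la" and "les" act as articles before nominal words and as pronouns before verbal words, so a nominal/verbal dictionary learned from untagged text can decide between the two readings. This is an illustration only. It is not the paper's three-step procedure, whose details the abstract does not give. The seed word lists, the function names `build_dictionary` and `disambiguate`, and the toy corpus are all assumptions introduced for this example.

```python
from collections import Counter

# Assumed seed lists (not from the paper): words whose right neighbour is
# taken to be nominal or verbal in this toy setting.
NOMINAL_CUES = {"un", "une", "des", "mon", "ma", "mes", "ce", "cette", "ces"}
VERBAL_CUES = {"je", "tu", "il", "elle", "ils", "elles", "ne"}
AMBIGUOUS = {"le", "la", "les"}


def build_dictionary(sentences):
    """Classify words as 'nominal' or 'verbal' by voting over left contexts."""
    votes = {}
    for tokens in sentences:
        for prev, word in zip(tokens, tokens[1:]):
            if word in AMBIGUOUS:
                continue  # never learn a category for the ambiguous forms
            if prev in NOMINAL_CUES:
                votes.setdefault(word, Counter())["nominal"] += 1
            elif prev in VERBAL_CUES:
                votes.setdefault(word, Counter())["verbal"] += 1
    # Keep the majority category for each word that received a vote.
    return {word: count.most_common(1)[0][0] for word, count in votes.items()}


def disambiguate(tokens, dictionary):
    """Label each ambiguous token from the category of the word after it."""
    labels = []
    for i, tok in enumerate(tokens):
        if tok not in AMBIGUOUS:
            continue
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        category = dictionary.get(nxt)
        if category == "nominal":
            labels.append((i, tok, "article"))
        elif category == "verbal":
            labels.append((i, tok, "pronoun"))
        else:
            labels.append((i, tok, "unknown"))
    return labels


# Hypothetical untagged toy corpus, already tokenised.
corpus = [
    "je mange une pomme".split(),
    "il regarde un chat".split(),
    "il mange la pomme".split(),
    "je la regarde".split(),
]
dictionary = build_dictionary(corpus)

print(disambiguate("il mange la pomme".split(), dictionary))
print(disambiguate("je la regarde".split(), dictionary))
```

In this sketch a word never seen after a seed cue leaves the preceding "le", "la" or "les" labelled "unknown". A self-correcting system such as the one the abstract describes would need some further mechanism for those cases.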

Similar resources

Learning Morpho-Lexical Probabilities from an Untagged Corpus with an Application to Hebrew

This paper proposes a new approach for acquiring morpho-lexical probabilities from an untagged corpus. This approach demonstrates a way to extract very useful and nontrivial information from an untagged corpus, which otherwise would require laborious tagging of large corpora. The paper describes the use of these morpho-lexical probabilities as an information source for morphological disambiguat...

Morphological Disambiguation in Hebrew Using A Priori Probabilities

This paper describes a new approach for morphological disambiguation in Hebrew using an untagged corpus. This approach demonstrates a way to extract very useful and nontrivial information from an untagged corpus, which otherwise would require laborious tagging of large corpora. The suggested method depends primarily on the following property: a lexical entry in Hebrew may have many different wo...

Partially Supervised Sense Disambiguation by Learning Sense Number from Tagged and Untagged Corpora

Supervised and semi-supervised sense disambiguation methods will mis-tag the instances of a target word if the senses of these instances are not defined in sense inventories or there are no tagged instances for these senses in training data. Here we used a model order identification method to avoid the misclassification of the instances with undefined senses by discovering new senses from mixed...

Combining Hand-crafted Rules and Unsupervised Learning in Constraint-based Morphological Disambiguation

This paper presents a constraint-based morphological disambiguation approach that is applicable to languages with complex morphology, specifically agglutinative languages with productive inflectional and derivational morphological phenomena. In certain respects, our approach has been motivated by Brill's recent work (Brill, 1995b), but with the observation that his transformational approach is not ...

Evaluation of a possibilistic classification approach for Arabic texts disambiguation (Evaluation d'une approche de classification possibiliste pour la désambiguïsation des textes arabes) [in French]

Morphological disambiguation of Arabic words consists in identifying their appropriate morphological analysis. In this paper, we present three models of morphological disambiguation of non-vocalized Arabic texts based on possibilistic classification. This approach deals with imprecise training and testing datasets, as we learn from untagged texts. We test our approach on two corpora, i.e. ...

Publication date: 2003